Novel glyphosate-tolerant 5-enolpyruvylshikimate-3-phosphate synthase and gene encoding it

ABSTRACT

The present invention relates to a novel 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS). It is highly tolerant to glyphosate, the competitive inhibitor of the substrate phosphoenolpyruvate (PEP). The invention also relates to a gene encoding the synthase, a construct and a vector comprising said gene, and a host cell transformed with said construct or vector.

TECHNICAL FIELD

The present invention relates to a novel glyphosate-tolerant 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS), an isolated nucleic acid sequence encoding the synthase, a nucleic acid construct comprising said sequence or its coding region, a vector carrying said sequence, its coding region, or said nucleic acid construct, and a host cell transformed with said construct or vector.

BACKGROUND

5-Enolpyruvylshikimate-3-phosphate synthase (EPSPS) is a key enzyme in the aromatic amino acid synthesis pathway in plants and bacteria. Glyphosate, also referred to as N-(phosphonomethyl)glycine, is a broad-spectrum, highly efficient post-emergence herbicide. Glyphosate is a competitive inhibitor with respect to phosphoenolpyruvate (PEP), which is one of the substrates of EPSPS. Glyphosate blocks the EPSPS-catalyzed conversion of PEP and shikimate-3-phosphate to 5-enolpyruvylshikimate-3-phosphate, thereby blocking the shikimic acid pathway, which supplies the precursor for the synthesis of aromatic amino acids, and leading to the death of plants and bacteria.

Glyphosate tolerance may be conferred on a plant by stably introducing a gene encoding a glyphosate-tolerant EPSPS into the plant genome. There are two main classes of known glyphosate-tolerant EPSPS genes: Class I (see, e.g., U.S. Pat. No. 4,971,908; U.S. Pat. No. 5,310,667; U.S. Pat. No. 5,866,775) and Class II (see, e.g., U.S. Pat. No. 5,627,061; U.S. Pat. No. 5,633,435). These genes have been successfully introduced into plant genomes, and glyphosate-tolerant plant cells and plants have been obtained.

The invention aims to find a novel EPSPS of native sequence that is tolerant to glyphosate.

SUMMARY

An object of the invention is to provide a newly isolated nucleic acid sequence encoding a glyphosate-tolerant EPSPS protein.

A further object of the invention is to provide a novel glyphosate-tolerant EPSPS protein.

A further object of the invention is to provide a nucleic acid construct, formed by operably linking the above-said nucleic acid sequence to a control sequence essential for expressing said nucleic acid sequence in a selected host cell. Specifically, the control sequence optionally includes a promoter, an enhancer, a leader sequence, a polyadenylation signal, and the initiation and termination sequences for transcription and translation.

A further object of the invention is to provide a vector comprising the above-said nucleic acid sequence or nucleic acid construct.

A further object of the invention is to provide a host cell transformed with the above-said construct or vector. Said host cell may express the protein encoded by the above-said nucleic acid sequence under appropriate conditions, so that said protein exhibits enzymatic EPSPS activity and glyphosate tolerance, whereby the host cell acquires glyphosate tolerance.

A further object of the invention is to provide the above-said host cell and progeny cells thereof. Said cells contain the aforesaid nucleic acid sequence or its coding region, or the nucleic acid construct or vector, and are glyphosate-tolerant.

Other objects of the invention are illustrated in the following description and examples.

DISCLOSURE OF THE INVENTION

The invention provides a novel glyphosate-tolerant 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS) having a native sequence. The term “native sequence” denotes a sequence which has not been modified by mutagenesis, or by biological or chemical modification such as genetic engineering. The invention provides an isolated amino acid sequence of said EPSPS (SEQ ID NO: 3). Any amino acid sequence which is modified by deletion, addition and/or substitution of one or more amino acid residues in SEQ ID NO: 3 is included in the scope of the invention, provided that a protein having the modified sequence has EPSPS activity and glyphosate tolerance.

The invention further provides an isolated nucleic acid sequence encoding said EPSPS (SEQ ID NO:2, in particular the coding region). Any nucleic acid sequence which is modified by deletion, addition and/or substitution of one or more nucleotides in SEQ ID NO: 2 is included in the scope of the invention, provided that the modified sequence encodes a protein having EPSPS activity and glyphosate tolerance.

The nucleic acid construct of the invention is constructed by operably linking the EPSPS-encoding nucleic acid sequence of the invention with another homologous or heterologous sequence.

According to the method of the invention, the nucleic acid construct or the isolated nucleic acid sequence of the invention is incorporated into a vector, and a selected host cell is transformed with said vector. The EPSPS enzyme of the invention is expressed, and the recombinant host cell is thereby made tolerant to glyphosate. Alternatively, instead of transformation with a vector, the isolated nucleic acid sequence or the nucleic acid construct of the invention is introduced into a host cell directly by conventional methods such as electroporation. The EPSPS enzyme of the invention is expressed and the host cell is made tolerant to glyphosate. The vector and the recombinant host cell thus obtained, as well as the method for obtaining the cell, are included in the scope of the invention.

DEFINITIONS

The term “percent (%) sequence homology” used herein refers to the percentage of residues in a candidate sequence that are identical with the residues in the target sequence, after aligning the sequences and introducing gaps, if necessary, to achieve the maximum percent sequence homology, and not considering any conservative substitutions as part of the sequence homology. The sequences herein include amino acid sequences and nucleotide sequences. The determination of percent (%) sequence homology may be achieved in various ways that are within the skill in the art, for example, using publicly available computer software such as BLAST, BLAST-2, ALIGN, ALIGN-2 or Megalign (DNASTAR). Those skilled in the art can determine appropriate parameters for measuring alignment, including any algorithm needed to achieve maximal alignment over the full length of the sequences being compared.
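By way of illustration only, the definition above can be sketched in Python for two sequences that have already been aligned by one of the named programs. The function name, the "-" gap character, and the choice of the candidate's residue count as denominator are assumptions made for this sketch; the alignment programs listed above may report the value over a different denominator.

```python
def percent_homology(aligned_candidate: str, aligned_target: str, gap: str = "-") -> float:
    """Percentage of candidate residues identical to the aligned target residue.

    Both inputs are rows of one pairwise alignment (equal length, gaps as `gap`).
    Conservative substitutions are NOT counted, as stated in the definition.
    """
    if len(aligned_candidate) != len(aligned_target):
        raise ValueError("aligned sequences must have equal length")
    # Columns in which both rows carry the same non-gap residue.
    identical = sum(
        1
        for cand, targ in zip(aligned_candidate, aligned_target)
        if cand == targ and cand != gap
    )
    # Assumption: the denominator is the number of residues in the candidate.
    candidate_residues = sum(1 for cand in aligned_candidate if cand != gap)
    if candidate_residues == 0:
        raise ValueError("candidate sequence contains no residues")
    return 100.0 * identical / candidate_residues


# Hypothetical toy alignment: 4 identical columns out of 5 candidate residues.
toy_value = percent_homology("MKT-AL", "MKTGAV")
```

The same function applies to nucleotide alignments, since it only compares characters column by column.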

The term “nucleic acid construct” used herein refers to a single-stranded or double-stranded nucleic acid molecule, which is isolated from a native gene, or is modified to combine nucleic acid fragments in a manner not existing in nature. When the nucleic acid construct contains all the control sequences required for expressing the EPSPS of the invention, the term “nucleic acid construct” is synonymous with the term “expression cassette”.

The term “control sequence” used herein comprises all components which are essential or advantageous for the expression of a polypeptide of the present invention. Each control sequence may be native or foreign to the nucleic acid sequence encoding said polypeptide. Such control sequences include, but are not limited to, a leader sequence, a polyadenylation sequence, a propeptide sequence, a promoter, and a transcription terminator. The control sequences include at least a promoter and termination signals for both transcription and translation. A control sequence may be provided with a linker to introduce specific restriction sites facilitating the ligation of the control sequence with the coding region of the nucleic acid sequence encoding a heterologous polypeptide.

The term “operably linking” denotes the linkage of the isolated nucleic acid sequence of the invention to any other sequence homologous or heterologous to said sequence, so that they together encode a product. When necessary, said sequences may be separated by methods such as restriction digestion. The homologous or heterologous sequence as disclosed herein may be any sequence, such as any control sequence, that directs the expression of the isolated nucleic acid sequence of the invention in a selected host cell, or a sequence that codes for a fusion protein together with the isolated nucleic acid sequence of the invention.

The term “host cell” used herein refers to any cell capable of receiving the isolated nucleic acid sequence of the invention, or receiving the construct or vector comprising said sequence, and maintaining it stably therein. The host cell acquires the feature determined by the isolated nucleic acid sequence of the invention when the cell contains said sequence.

DETAILED DESCRIPTION

The inventors have surprisingly discovered and isolated a novel glyphosate-tolerant EPSPS coding gene. The nucleotide sequence of said gene (SEQ ID NO:2) and its coding region (CDS, nucleotides 574-1803) are disclosed herein. The amino acid sequence (SEQ ID NO:3) encoded by said CDS is also disclosed herein. DNAMAN version 4.0 is used with the CLUSTAL format to align the amino acid sequence shown in SEQ ID NO:3 with the known sequences of EPSPS Classes I and II. It is found that the sequence of the invention does not comprise any sequence claimed by preceding patents (see FIG. 2). A BLAST search of the GenBank protein sequence database shows that SEQ ID NO:3 of the invention is 37% homologous with the EPSPS amino acid sequence derived from Clostridium acetobutylicum, and is 20% homologous with the EPSPS amino acid sequence derived from E. coli (see FIG. 2). An NCBI-BLAST search finds no known nucleic acid sequence that is homologous with nucleotides 574-1803 of SEQ ID NO:2. Therefore, the nucleic acid sequence and the protein described in the invention are novel.
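As a small worked check of the coordinates given above, the following Python sketch extracts a region from a sequence using 1-based, inclusive positions and computes the length of the CDS at nucleotides 574-1803. The function names are hypothetical, and the sequence of SEQ ID NO:2 itself is not reproduced here; only the arithmetic on the stated coordinates is shown.

```python
def extract_region(sequence: str, start: int, end: int) -> str:
    """Return positions start..end (1-based, inclusive) of a sequence."""
    if not (1 <= start <= end <= len(sequence)):
        raise ValueError("coordinates fall outside the sequence")
    # Python slices are 0-based and end-exclusive, hence start - 1.
    return sequence[start - 1:end]


def cds_length_and_codons(start: int, end: int) -> tuple[int, int]:
    """Length in nucleotides and number of complete codons of a CDS."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("CDS length is not a multiple of three")
    return length, length // 3


# Coordinates taken from the description: nucleotides 574-1803 of SEQ ID NO:2.
CDS_LENGTH, CDS_CODONS = cds_length_and_codons(574, 1803)
```

Under these coordinates the region spans 1230 nucleotides, that is, 410 codons; whether the last codon is a stop codon must be read from the sequence listing itself.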

Hence, one aspect of the invention relates to an isolated nucleic acid sequence, which comprises the nucleic acid sequence shown in SEQ ID NO:2 and encodes a glyphosate-tolerant EPSPS.

The invention also relates to an isolated nucleic acid sequence as shown in nucleotides 574-1803 of SEQ ID NO:2, which encodes a glyphosate-tolerant EPSPS.

The invention also relates to a nucleic acid sequence obtained by modifying one or more nucleotides, or by deleting and/or adding 3 or a multiple of 3 nucleotides, in SEQ ID NO:2 or in the sequence of nucleotides 574-1803 of SEQ ID NO:2, wherein said nucleotide sequence is capable of coding for a protein with 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS) activity and glyphosate tolerance. The modification, deletion and addition of nucleotides are conventional within the skill in the art.
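The reason for restricting deletions and additions to 3 or a multiple of 3 nucleotides is that such changes leave the downstream reading frame intact. A minimal Python sketch on a hypothetical toy sequence (not taken from SEQ ID NO:2) illustrates the difference; the helper names are assumptions made for this example.

```python
def codons(seq: str) -> list[str]:
    """Split a sequence into complete codons, reading from position 0."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def delete(seq: str, pos: int, n: int) -> str:
    """Delete n nucleotides starting at 0-based position pos."""
    return seq[:pos] + seq[pos + n:]


# Hypothetical toy coding sequence: ATG GCT AAA GGT TAA
TOY = "ATGGCTAAAGGTTAA"

IN_FRAME = delete(TOY, 3, 3)   # removes one whole codon (GCT)
SHIFTED = delete(TOY, 3, 1)    # removes a single nucleotide

IN_FRAME_CODONS = codons(IN_FRAME)  # downstream codons unchanged
SHIFTED_CODONS = codons(SHIFTED)    # every downstream codon altered
```

After the three-nucleotide deletion the remaining codons, including the stop codon, are read as before; after the single-nucleotide deletion all downstream codons change and the original stop codon is no longer read in frame.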

The invention also relates to a nucleotide sequence having homology, such as a homology of at least about 65%, preferably at least about 66%, preferably at least about 67%, preferably at least about 68%, preferably at least about 69%, preferably at least about 70%, preferably at least about 71%, preferably at least about 72%, preferably at least about 73%, preferably at least about 74%, preferably at least about 75%, preferably at least about 76%, preferably at least about 77%, preferably at least about 78%, preferably at least about 79%, more preferably at least about 80%, more preferably at least about 81%, more preferably at least about 82%, more preferably at least about 83%, more preferably at least about 84%, more preferably at least about 85%, more preferably at least about 86%, more preferably at least about 87%, more preferably at least about 88%, more preferably at least about 89%, more preferably at least about 90%, more preferably at least about 91%, more preferably at least about 92%, more preferably at least about 93%, more preferably at least about 94%, more preferably at least about 95%, more preferably at least about 96%, more preferably at least about 97%, more preferably at least about 98%, most preferably at least about 99%, with the nucleic acid sequence defined by SEQ ID NO:2 or nucleotides 574-1803 of SEQ ID NO:2. Such a sequence is within the scope of the invention provided that it encodes a protein having glyphosate-tolerant EPSPS activity.

The isolated nucleic acid sequence of the invention may be cloned from nature by the methods disclosed in the Examples of the invention. The cloning method may comprise the steps of isolating and digesting with restriction enzymes a nucleic acid fragment comprising the nucleic acid sequence encoding the protein of interest, inserting said fragment into a vector, and introducing the vector into a host cell, whereby copies or clones of said nucleic acid sequence are replicated in said host cell. However, it may be easier to synthesize the sequence with an automatic nucleotide synthesizer (such as the ABI394 DNA synthesizer of Applied Biosystems) according to the nucleotide sequence disclosed herein, or to synthesize fragments of the nucleic acid sequence separately and ligate the fragments into a full-length sequence with conventional ligases and vectors using the method of Chinese Patent application 99103472.4, which was disclosed on Oct. 4, 2000.

The nucleic acid sequence of the invention may be a genomic, cDNA, RNA, semi-synthesized or completely synthesized sequence, or any combination thereof.

Another aspect of the invention relates to an isolated nucleic acid sequence encoding a protein with EPSPS activity and glyphosate tolerance, wherein the protein comprises the amino acid sequence shown in SEQ ID NO:3.

The invention further relates to an isolated nucleic acid sequence encoding a protein with EPSPS activity and glyphosate tolerance, wherein the amino acid sequence of the protein comprises a substitution, deletion and/or addition of one or more amino acid residues in the amino acid sequence of SEQ ID NO:3, while the EPSPS activity and glyphosate tolerance are retained. Said substitution, deletion and/or addition of one or more amino acid residues is within the conventional technique in the art. Such a change of amino acids is preferably a minor change, namely a conserved amino acid substitution without significant influence on the folding and/or activity of the protein; a minor deletion of generally about 1-30 amino acids; a minor extension at the amino terminus or carboxyl terminus, such as an extension of one methionine residue at the amino terminus; or a small linker peptide of, for example, about 20-25 residues.

Examples of conserved substitutions are those occurring within the following amino acid groups: basic amino acids (e.g. Arg, Lys and His), acidic amino acids (e.g. Glu and Asp), polar amino acids (e.g. Gln and Asn), hydrophobic amino acids (e.g. Leu, Ile and Val), aromatic amino acids (e.g. Phe, Trp and Tyr), and small amino acids (e.g. Gly, Ala, Ser, Thr, and Met). Amino acid substitutions which usually do not change a particular activity are known in the art, and have been described by H. Neurath and R. L. Hill, The Proteins, Academic Press, New York, 1979. The most common substitutions are Ala/Ser, Val/Ile, Asp/Glu, Thr/Ser, Ala/Gly, Ala/Thr, Ser/Asn, Ala/Val, Ser/Gly, Tyr/Phe, Ala/Pro, Lys/Arg, Asp/Asn, Leu/Ile, Leu/Val, Ala/Glu, and Asp/Gly, and the reverse substitutions thereof.
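The grouping above lends itself to a simple lookup. The following Python sketch, offered as an illustration only, encodes the six groups with standard three-letter residue codes and tests whether two residues fall within the same group; the names CONSERVED_GROUPS and is_conserved_substitution are hypothetical and are not part of the disclosure.

```python
# Amino acid groups as listed in the description (standard three-letter codes).
CONSERVED_GROUPS = {
    "basic": {"Arg", "Lys", "His"},
    "acidic": {"Glu", "Asp"},
    "polar": {"Gln", "Asn"},
    "hydrophobic": {"Leu", "Ile", "Val"},
    "aromatic": {"Phe", "Trp", "Tyr"},
    "small": {"Gly", "Ala", "Ser", "Thr", "Met"},
}


def is_conserved_substitution(original: str, replacement: str) -> bool:
    """True if both residues belong to one of the groups listed above."""
    return any(
        original in members and replacement in members
        for members in CONSERVED_GROUPS.values()
    )


LEU_TO_ILE = is_conserved_substitution("Leu", "Ile")   # same group
ARG_TO_ASP = is_conserved_substitution("Arg", "Asp")   # basic vs. acidic
ALA_TO_VAL = is_conserved_substitution("Ala", "Val")   # small vs. hydrophobic
```

Group membership is only one criterion: some of the common substitutions listed above, such as Ala/Val or Ser/Asn, pair residues from different groups, so a check of this kind would need to be combined with the list of common substitutions before a substitution is classified.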

It is apparent to those skilled in the art that such substitutions can be made outside the regions important for function and still yield an active polypeptide. For the polypeptide encoded by the isolated nucleic acid sequence of the invention, the amino acid residues which are important for function, and which are therefore selected not to be substituted, may be identified by methods known in the art, for example site-directed mutagenesis or alanine scanning mutagenesis (see, e.g. Cunningham and Wells, 1989, Science 244: 1081-1085). The latter technique comprises introducing a mutation at each positively charged residue in the molecule, and determining the glyphosate-tolerant EPSPS activity of the mutated molecule, thereby determining the amino acid residues important for the activity. The site of substrate-enzyme interaction can also be determined by analysis of the 3D structure, which may be determined by techniques such as nuclear magnetic resonance, crystallography or photoaffinity labeling (see, e.g. de Vos et al, 1992, Science 255: 306-312; Smith et al, 1992, J. Mol. Biol. 224: 899-904; Wlodawer et al, 1992, FEBS Letters 309: 59-64).

The invention also relates to an amino acid sequence having homology, such as a homology of at least about 50%, at least about 60%, at least about 65%, preferably at least about 66%, preferably at least about 67%, preferably at least about 68%, preferably at least about 69%, preferably at least about 70%, preferably at least about 71%, preferably at least about 72%, preferably at least about 73%, preferably at least about 74%, preferably at least about 75%, preferably at least about 76%, preferably at least about 77%, preferably at least about 78%, preferably at least about 79%, more preferably at least about 80%, more preferably at least about 81%, more preferably at least about 82%, more preferably at least about 83%, more preferably at least about 84%, more preferably at least about 85%, more preferably at least about 86%, more preferably at least about 87%, more preferably at least about 88%, more preferably at least about 89%, more preferably at least about 90%, more preferably at least about 91%, more preferably at least about 92%, more preferably at least about 93%, more preferably at least about 94%, more preferably at least about 95%, more preferably at least about 96%, more preferably at least about 97%, more preferably at least about 98%, most preferably at least about 99%, with the amino acid sequence shown in SEQ ID NO:3. Such a sequence is within the scope of the invention provided that a protein comprising said homologous sequence has glyphosate-tolerant EPSPS activity.

The protein encoded by the isolated nucleic acid sequence according to the invention has at least 20%, preferably at least 40%, more preferably at least 60%, even more preferably at least 80%, even more preferably at least 90%, most preferably at least 100% of the EPSPS activity of the amino acid sequence shown in SEQ ID NO:3.

The invention also relates to a nucleic acid construct comprising the above-defined nucleic acid sequence, or its coding sequence (e.g. nucleotides 574-1803 of SEQ ID NO:2).

The nucleic acid construct of the invention further comprises a control sequence essential to the expression of the aforesaid sequence in a selected host cell. The control sequence is operably linked to the aforesaid isolated nucleic acid sequence in the nucleic acid construct.

The control sequence may be a promoter, which contains a transcriptional control sequence directing the expression of a polypeptide. The promoter may be any nucleic acid sequence having transcriptional activity in the selected cell, such as a mutated, truncated, or hybrid promoter. Such a promoter may be obtained from a gene encoding an extracellular or intracellular polypeptide. Such a polypeptide may or may not be homologous to the cell. Various promoters used in prokaryotic cells are known in the art.

The control sequence may also be an appropriate transcription terminator sequence, that is, a sequence recognized by a selected host cell to terminate transcription. The terminator sequence is operably linked to the 3′ end of the nucleic acid sequence coding for the polypeptide. Any terminator which is functional in the selected host cell may be used in the invention.

The control sequence may also be an appropriate leader sequence, that is, a non-translated region of an mRNA which is important for translation in the cell. The leader sequence is operably linked to the 5′ end of the nucleic acid sequence coding for the polypeptide. Any leader sequence which is functional in the selected host cell may be used in the invention.

The control sequence may also be a polyadenylation sequence. The sequence is operably linked to the 3′ end of the nucleic acid sequence and, when transcribed, is recognized by the cell as a signal to add polyadenosine residues to the transcribed mRNA. Any polyadenylation sequence which is functional in the selected host cell may be used in the invention.

The nucleic acid construct may further comprise one or more nucleic acid sequences which encode one or more factors useful in directing the expression of a foreign polypeptide, such as a transcription-activating factor (e.g. a trans-acting factor), a partner protein and a processing protein. Any factor effective in host cells, in particular bacterial cells and plant cells, may be used in the invention. The nucleic acid encoding one or more such factors is not always linked in tandem with the nucleic acid encoding a foreign polypeptide.

The nucleic acids and control sequences described above may be joined in a conventional vector, such as a plasmid or a virus, to produce a “recombinant expression vector” according to the invention, using methods well known to those skilled in the art (see J. Sambrook, E. F. Fritsch and T. Maniatis, 1989, Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor, N.Y.). The vector may have one or more convenient restriction sites. The choice of vector usually depends on the compatibility of the vector with the host cell being used. The vector may be linear or closed circular, and it may be autonomously replicable as an extrachromosomal entity whose replication is independent of chromosomal replication, e.g. a plasmid (an extrachromosomal element), a minichromosome, or an artificial chromosome. The vector may comprise any means for ensuring self-replication. Alternatively, the vector is integrated into the genome and replicated with the chromosome after being introduced into the cell. The vector system may be a single vector or plasmid, two or more vectors or plasmids (which together contain the nucleic acid sequence of interest), or a transposon.

For integration into the host cell genome, the vector may comprise additional nucleic acid sequences that direct the integration of said vector into the genome through homologous recombination. The additional nucleic acid sequences enable said vector to integrate into the genome at a precise position. To increase the likelihood of integration at a precise position, the integration elements should preferably comprise a sufficient number of nucleotides, for example 100-1500 base pairs, preferably 400-1500 base pairs, most preferably 800-1500 base pairs, which are highly homologous to the corresponding target sequence. Said integration element may be any sequence that is homologous to the target sequence in the genome of the cell. Moreover, said integration elements may be non-coding or coding nucleic acid sequences. Alternatively, said vector may integrate into the genome of the cell through non-homologous integration.

For autonomous replication, the vector may further contain an origin of replication enabling said vector to replicate autonomously in bacterial cells and plant cells.

The invention also relates to a recombinant “host cell” comprising the nucleic acid sequence of the invention. The nucleic acid construct or vector comprising the nucleic acid sequence of the invention may be introduced into the host cell, so that the nucleic acid sequence of the invention is integrated into the chromosome or the vector is autonomously replicated, whereby the nucleic acid sequence of the invention is expressed stably by the host cell and makes the host cell glyphosate-tolerant.

The host cell may be a prokaryotic cell such as a bacterial cell, but is more preferably a eukaryotic cell such as a plant cell.

Common bacterial cells include the cells of Gram-positive bacteria such as Bacillus, and the cells of Gram-negative bacteria such as Escherichia or Pseudomonas. In a preferred embodiment, the bacterial host cell is a cell of E. coli.

The introduction of an expression vector into a bacterial host cell may be achieved by protoplast transformation (see Chang and Cohen, 1979, Molecular General Genetics 168: 111-115), using competent cells (see Young and Spizizen, 1961, J. Bacteriol. 81: 823-829, or Dubnau and Davidoff-Abelson, 1971, J. Mol. Biol. 56: 209-221), electroporation (see Shigekawa and Dower, 1988, Biotechniques 6: 742-751), or conjugation (see Koehler and Thorne, 1987, J. Bacteriol. 169: 5271-5278).

DESCRIPTION OF FIGURES

FIG. 1 shows the map of plasmid pKU2004.

FIG. 2 shows the amino acid sequence alignment between the EPSPS of Pseudomonas putida 4G-1 (P. P4G-1) and various known EPSPSs. Sequences shown boxed and shaded are claimed in previous patents.

FIG. 3 shows the growth curves of E. coli XL1-Blue MR carrying different EPSPS genes at different glyphosate concentrations.

EXAMPLES

Example 1 Isolation of Glyphosate-Tolerant Strains

Samples were taken from the vicinity of a glyphosate-producing factory in Hebei Province, China, and were diluted and spread on media containing glyphosate. A total of 48 strains with high tolerance to, and ability to degrade, glyphosate are isolated. One of these strains, 4G-1, is able to grow on a medium with 400 mM glyphosate, and is resistant to 100 mg/L Ampicillin. Said strain is selected for further studies.

Example 2 Identification of Glyphosate-Tolerant Strains

a) Mini-Prep of the Total DNA of Strain 4G-1

Strain 4G-1 is inoculated in 3 ml of LB liquid medium with 100 mg/L Ampicillin, and cultured at 28° C. overnight with shaking. The culture is centrifuged at 12000 rpm and the pellet is resuspended in 0.5 ml of Solution I (10 mM NaCl, 20 mM Tris-HCl pH 8.0, 1 mM EDTA). Proteinase K (Merck, Germany) and SDS are added to final concentrations of 10 μg/ml and 0.5%, respectively. The suspension is mixed by gentle inversion and left at 50° C. for more than 6 hrs. An equal volume of phenol is added. The mixture is gently inverted and left at room temperature for 10 min. Then, the mixture is centrifuged at 12000 rpm at room temperature for 5 min. The aqueous supernatant is drawn off with tips (Axy Gen, USA) and re-extracted with an equal volume of phenol/chloroform. For precipitation, 10% (by volume) of 3 M NaAc and 2.3 volumes of ethanol are added to the supernatant. The mixture is centrifuged at 12000 rpm at −10° C. for 25 min. After the supernatant is discarded, the precipitate is washed with 500 μl of 70% ethanol and centrifuged at 12000 rpm for 1 min. After the supernatant is drawn off completely, the precipitate is dried in a Savant for 20 min or in an incubator at 37° C. for 1 hr. The precipitate is dissolved in 100 μl of TE solution (10 mM Tris-Cl, 1 mM EDTA, pH 8.0), then frozen at −20° C. for further studies.

b) Cloning of 16S rRNA of Strain 4G-1

A pair of universal primers for 16S rRNA (primer 1: 5′ AGAGTTGACATGGCTCAG 3′ and primer 2: 5′ TACGGTTACCTTGTTACGACTT 3′) is synthesized. The PCR amplification reaction is run in the Robocycler 40 (Stratagene) using the primers. The reaction system is: 1 μl of total DNA of strain 4G-1 as template, 5 μl of buffer, 4 μl of 10 μmol dNTP, 1 μl of 20 pmol/μl primer 1, 1 μl of 20 pmol/μl primer 2 and 37 μl of deionized water. The reaction conditions are: 94° C. for 10 min, after which 1 μl of 5U Pyrobest Taq DNA polymerase is added; then 30 cycles of 94° C. for 1 min, 50° C. for 1 min and 72° C. for 2 min; and a final extension at 72° C. for another 10 min. A fragment of about 1.5 kb is obtained. The PCR product is purified according to the method provided by the manufacturer of the PCR product purification kit, Boehringer.
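The reaction system and cycling conditions above can be checked with simple arithmetic. The Python sketch below, given for illustration only, sums the listed component volumes, derives the final primer concentration, and adds up the programmed hold times; the variable and function names are hypothetical, and ramp times between temperatures are ignored.

```python
# Component volumes in microlitres, as listed in the reaction system.
PCR_MIX_UL = {
    "total DNA template": 1.0,
    "buffer": 5.0,
    "dNTP": 4.0,
    "primer 1 (20 pmol/ul stock)": 1.0,
    "primer 2 (20 pmol/ul stock)": 1.0,
    "deionized water": 37.0,
    "DNA polymerase (added after the initial 94 C step)": 1.0,
}


def final_concentration(stock: float, added_ul: float, total_ul: float) -> float:
    """Concentration after dilution, in the same unit as the stock."""
    return stock * added_ul / total_ul


def program_minutes(initial: float, cycles: int, steps: tuple, final: float) -> float:
    """Total programmed hold time in minutes, excluding temperature ramps."""
    return initial + cycles * sum(steps) + final


TOTAL_UL = sum(PCR_MIX_UL.values())
PRIMER_FINAL_PMOL_PER_UL = final_concentration(20.0, 1.0, TOTAL_UL)
HOLD_MINUTES = program_minutes(10, 30, (1, 1, 2), 10)
```

With the listed volumes the reaction totals 50 μl, each primer is present at 0.4 pmol/μl, and the programmed holds add up to 140 min.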

The purified PCR product is subjected to a poly-A (deoxyadenosine) tailing reaction. The reaction system is: 20 μl (2 μg) of purified PCR product, 5 μl of buffer, 1 μl (5U) of Taq DNA polymerase (DingGuo Ltd, Beijing), 4 μl of 5 μmol dATP and 20 μl of deionized water.

The resulting products are purified by a purification method for PCR products, and are ligated to a T vector (Takara, Dalian) according to the instructions of the manufacturer, Takara, to obtain plasmid pKU2000. The sequencing result is shown in SEQ ID NO: 1. Using BLAST software and the BLASTP 2.2.2 [Dec. 14-2001] database for sequence alignment at the U.S. National Center for Biotechnology Information (NCBI), it is found that said sequence is 99% homologous to the nucleotide sequence of the 16S rRNA of Pseudomonas putida. Thus, strain 4G-1 is considered to be P. putida, and is designated as P. putida 4G-1 (abbreviated as P. P4G-1). Said strain was deposited on Apr. 30, 2002 at the Chinese General Microbiological Culture Centre (CGMCC) (ZhongGuan Cun, Beijing, China) with the accession number CGMCC 0739.

Example 3 Construction of the Genomic Library of Strain 4G-1

a) Maxi-prep of the Total DNA of 4G-1

The strain P. P4G-1 is inoculated into 100 ml of LB medium (supplemented with 100 mg/L Ampicillin) in a 250 ml flask, and cultured at 28° C. overnight with shaking at 200 rpm. The culture is centrifuged at 8000 rpm for 5 min, then the pellet is resuspended in 14 ml of Solution I. Proteinase K (Merck, Germany) and SDS are added to final concentrations of 10 μg/ml and 0.5%, respectively. The mixture is mixed by gentle inversion and left at 50° C. for more than 6 hr. An equal volume of phenol is added. The mixture is gently inverted and left at room temperature for 10 min. The mixture is centrifuged at room temperature at 4000 rpm for 20 min. The aqueous supernatant is drawn off with wide-bore tips, and extracted with an equal volume of phenol/chloroform. For precipitation, 10% (by volume) of 3 M NaAc (pH 5.5) and 2.3 volumes of ethanol are added to the supernatant. The DNA is carefully picked out using a glass rod and washed in 70% ethanol. After the ethanol is discarded, the DNA is dried. 2 ml of TE solution (pH 8.0) is added and the DNA is left to dissolve at 4° C. for 24 hr; about 1 mg of total DNA is obtained. The DNA fragments are found to be larger than 80 kb by 0.3% agarose gel electrophoresis.

b) Recovery of the Digestive Products of Total DNA

200 μl of total DNA (100 μg) is digested with 5U of restriction endonuclease Sau3AI at room temperature for 20 min, 30 min and 45 min, respectively. The products are combined, EDTA is added to a final concentration of 0.25 mM, and the mixture is extracted with an equal volume of phenol/chloroform. For precipitation, 10% (by volume) of 3 M NaAc and 2.3 volumes of ethanol are added to the supernatant. The precipitate is washed with 70% ethanol and dried as described above. The precipitate is then dissolved in 200 μl of TE and loaded on 12 ml of a 10-40% sucrose density gradient in a Beckman SW28 ultracentrifuge tube. The sample is placed in the Beckman SW28 rotor at 20° C. and centrifuged at 120000 g for 18 hrs. Fractions of 0.5 ml are collected from the top, and a 15 μl aliquot of each is analyzed by 0.3% agarose gel electrophoresis. The fractions comprising DNA of 30-40 kb are combined, mixed with about 2 volumes of deionized water and 7 volumes of ethanol, and left at −20° C. overnight for precipitation; the precipitate is then washed with 70% ethanol, dried and dissolved in 50 μl of TE.

c) Construction of Genomic Library of 4G-1

SuperCos1 cosmid vector is digested with XbaI, treated with alkaline phosphatase and digested with BamHI, then ligated with the isolated total DNA fragments described above, using the method of Stratagene.

Gigapack III Gold packaging extract is used to package the ligation product in vitro. The library is titrated with a Stratagene kit on E. coli XL1-Blue MR (Stratagene), using the method of Stratagene. The obtained library is then amplified and stored according to the instructions of the same manufacturer.

Example 4 Isolation, Screening, Sequencing and Analysis of a Glyphosate-Tolerant EPSPS Gene

a) Screen of the Glyphosate-Tolerant Gene Library

1 ml of stock solution of the library described above is centrifuged, the supernatant is discarded and the pellet is resuspended in 1 ml of sterile saline. Centrifugation is repeated, the supernatant is discarded, and the pellet is again resuspended in 1 ml of sterile saline. The suspension is spread onto 10 mM glyphosate-50 mg/L Ampicillin plates (20 mM ammonium sulfate; 0.4% glucose; 10 mM glyphosate; 0.5 mM dipotassium hydrogen phosphate; 0.1 mg/L ferric sulfate; 0.5 g/L magnesium sulfate; 0.5 g/L calcium chloride; 2.1 g/L sodium chloride; 50 mM Tris (pH 7.2); 5 mg/L Vitamin B1; 15 g/L agarose) at a density of about 10³ bacteria/plate. The plates are cultured overnight at 37° C. One strain is obtained and designated as BDS. The cosmid carried by the strain is designated as pKU2001.

b) Isolation of the Glyphosate-Tolerant Gene

The strain BDS is inoculated into 20 ml of LB (supplemented with 50 mg/L Ampicillin) in a 50 ml flask and cultured at 37° C. for 12 hr with shaking at 300 rpm. The cells are collected by centrifugation. The plasmid pKU2001 is extracted by the alkaline method according to Molecular Cloning: A Laboratory Manual, supra. This plasmid and, in parallel, the empty cosmid vector are transformed into E. coli XL1-Blue MR (Stratagene). The resulting strains are streaked on 10 mM glyphosate plates. Those carrying only the empty cosmid vector are not able to grow on the glyphosate medium, while those carrying pKU2001 grow well, indicating that pKU2001 carries the glyphosate-tolerant gene.

pKU2001 is digested with Sau3AI. DNA fragments of 2-4 kb are recovered by 0.7% agarose gel electrophoresis and ligated to the BamH I-digested and dephosphorylated pUC18 vector (Yanisch-Perron, C., Vieira, J. and Messing, J. 1985, Gene 33:103-119). The ligation product is transformed into E. coli XL1-Blue MR (Stratagene), and the transformants are streaked on 10 mM glyphosate plates supplemented with 50 mg/L Ampicillin. The plates are cultured overnight at 37° C. and dozens of clones are obtained. The clones are picked and their plasmids are extracted. A plasmid carrying an exogenous fragment of about 2 kb is selected and designated as pKU2002. The plasmid pKU2002 and the empty vector pUC18 are transformed into E. coli XL1-Blue MR (Stratagene), respectively. The strains are streaked on 10 mM glyphosate plates with 50 μg/ml Ampicillin. Those carrying the empty pUC18 vector are not able to grow on the glyphosate plate, while those carrying the plasmid pKU2002 grow well, indicating that pKU2002 carries the glyphosate-resistance gene.
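The subcloning step above works because Sau3AI and BamH I leave compatible cohesive ends. The following minimal Python sketch, which is an illustration and not part of the original protocol, derives the 5′ overhang of each enzyme from its recognition site and cut position (Sau3AI cuts before GATC, BamH I cuts between G and GATCC). The function and variable names are hypothetical.

```python
# Recognition site and top-strand cut offset for each enzyme.
# Sau3AI: ^GATC (offset 0); BamH I: G^GATCC (offset 1).
ENZYMES = {
    "Sau3AI": ("GATC", 0),
    "BamHI": ("GGATCC", 1),
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def five_prime_overhang(enzyme):
    """Single-stranded 5' overhang left by a symmetric cut in a palindromic site."""
    site, cut = ENZYMES[enzyme]
    return site[cut:len(site) - cut]


def ends_compatible(enzyme_a, enzyme_b):
    """Two 5' overhangs can anneal if one is the reverse complement of the other."""
    return five_prime_overhang(enzyme_a) == reverse_complement(
        five_prime_overhang(enzyme_b)
    )


if __name__ == "__main__":
    for name in ENZYMES:
        print(name, five_prime_overhang(name))
    print("compatible:", ends_compatible("Sau3AI", "BamHI"))
```

Both enzymes yield the same four-base GATC overhang, so Sau3AI fragments can be ligated directly into a BamH I-cut vector.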

c) pKU2002 is sequenced, and a full-length sequence of 1914 bp is obtained as shown in SEQ ID NO:2.

d) The sequence of pKU2002 is analyzed using DNASIS software. The only possible open reading frame (ORF, nucleotides 574-1803 of SEQ ID NO:2) is determined. The amino acid sequence encoded is shown in SEQ ID NO:3.
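As a quick consistency check on the coordinates above, the short Python sketch below computes the ORF length and codon count. It assumes the interval 574-1803 is 1-based and inclusive; whether the last codon is a stop codon is not stated in this section, so the codon count is given without that distinction. The function name is hypothetical.

```python
def orf_summary(start, end):
    """Return (length in bp, number of codons) for a 1-based, inclusive ORF interval."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return length, length // 3


def extract_orf(sequence, start, end):
    """Slice a 1-based, inclusive interval out of a Python string (0-based, end-exclusive)."""
    return sequence[start - 1:end]


if __name__ == "__main__":
    length_bp, codons = orf_summary(574, 1803)
    print(length_bp, codons)  # 1230 410
```

The interval corresponds to a 1230 bp sequence, that is, 410 codons.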

e) The protein sequence is subjected to a BLAST search against the GenBank protein sequence database of the U.S. National Center for Biotechnology Information (NCBI). The sequence of the protein is found to be 37% homologous to the amino acid sequence of the EPSPS of Clostridium acetobutylicum and 20% homologous to the amino acid sequence of the EPSPS of E. coli. The 1230 bp sequence is thought to encode an EPSP synthase, and the gene is designated as pparoA. Analysis shows that said gene does not belong to either known class of EPSPS; it is a novel EPSPS gene (Class III). The EPSPS amino acid sequence alignment of E. coli, Clostridium acetobutylicum and P. P4G-1 is shown in FIG. 2.

f) A pair of primers is designed, each comprising a BamH I site (GGATCC):
Primer 3: 5′-CGGGATCCTAAGTAAGTGAAAGTAACAATACAGC-3′
Primer 4: 5′-CGGGATCCCTTCTTCGGACAATGACAGAC-3′
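The position of the BamH I recognition sequence in Primers 3 and 4 can be verified with a few lines of Python. This is only an illustrative sketch: the primer sequences are copied from the text above, while the helper name `site_position` is hypothetical.

```python
BAMHI_SITE = "GGATCC"  # BamH I recognition sequence

PRIMERS = {
    "Primer 3": "CGGGATCCTAAGTAAGTGAAAGTAACAATACAGC",
    "Primer 4": "CGGGATCCCTTCTTCGGACAATGACAGAC",
}


def site_position(primer, site=BAMHI_SITE):
    """Return the 0-based index of the first occurrence of site, or -1 if absent."""
    return primer.upper().find(site)


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        pos = site_position(seq)
        clamp = seq[:pos]                          # extra 5' bases ahead of the site
        annealing = seq[pos + len(BAMHI_SITE):]    # remainder of the primer
        print(name, "site at index", pos, "clamp", clamp, "rest", annealing)
```

In both primers the site starts at the third base, preceded by a two-base 5′ extension (CG).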

The PCR amplification is run with pKU2001 as the template. The amplified fragment is digested with BamH I and inserted into pUC18 to obtain the plasmid pKU2003. Sequencing shows that no mismatched base has been introduced. pKU2003 is digested with BamH I, and the released fragment is ligated in the forward direction into the BamH I site of pACYC184 (Chang, A. C. Y., and Cohen, S. N., 1978, J. Bacteriol. 134: 1141-1156) to obtain the plasmid pKU2004, the map of which is shown in FIG. 1. The transcription of the pparoA gene in this plasmid is initiated by the Tc^(r) promoter derived from pACYC184.

Example 5 Cloning of E. coli aroA Gene and Site-Directed Mutagenesis of its Glyphosate Tolerance (Control Test)

E. coli ET8000 (MacNeil, T., MacNeil, D., and Tyler, B. 1982, J. Bacteriol. 150: 1302-1313) is inoculated into 3 ml of LB liquid medium in a 15 ml tube and cultured with shaking at 37° C. overnight. The culture is centrifuged, and total DNA is extracted according to the method described above.

A pair of primers is designed, each including a BamH I site (GGATCC):
Primer 5: 5′-CGGGATCCGTTAATGCCGAAATTTTGCTTAATC-3′
Primer 6: 5′-CGGGATCCAGGTCCGAAAAAAAACGCCGAC-3′

The E. coli aroA gene, which encodes the EPSPS protein of E. coli, is obtained by amplification using the total DNA of E. coli as the template. The amplified gene is digested with BamH I and inserted into pUC18 to obtain the plasmid pKU2005. The insert of the plasmid is sequenced and SEQ ID NO:10 is obtained. The sequence is confirmed to be correct by alignment with the known EPSPS gene sequence of E. coli in the GenBank database of NCBI. After the plasmid pKU2005 is digested with BamH I, the small fragment is recovered and inserted in the forward direction into the BamH I site of pACYC184, and the plasmid pKU2006 is obtained.

The aroA gene of E. coli is subjected to site-directed mutagenesis. The guanine at position 287 is mutated to cytosine, so that the glycine at position 96 of the E. coli EPSPS protein is changed to alanine. As above, the mutated gene fragment is inserted into the BamH I site of pACYC184 to obtain the plasmid pKU2007.
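The relationship between nucleotide 287 and residue 96 can be checked with simple arithmetic. The Python sketch below assumes that position 287 is counted from the first nucleotide of the coding sequence (1-based); this numbering is an assumption, and the actual glycine codon used in aroA is not given here, so all four glycine codons are tested. The helper names are hypothetical.

```python
# Standard genetic code entries for glycine (GGN) and alanine (GCN).
GLY_CODONS = {"GGA", "GGC", "GGG", "GGT"}
ALA_CODONS = {"GCA", "GCC", "GCG", "GCT"}


def codon_of(nt_pos):
    """Map a 1-based coding-sequence position to (codon number, position within codon)."""
    return (nt_pos - 1) // 3 + 1, (nt_pos - 1) % 3 + 1


def mutate(codon, offset, base):
    """Replace the base at the 1-based offset within a codon."""
    return codon[:offset - 1] + base + codon[offset:]


if __name__ == "__main__":
    codon_number, offset = codon_of(287)
    print(codon_number, offset)  # 96 2
    for gly in sorted(GLY_CODONS):
        mutant = mutate(gly, offset, "C")
        print(gly, "->", mutant, "Ala" if mutant in ALA_CODONS else "other")
```

Under this numbering, nucleotide 287 is the second base of codon 96, and a G-to-C change at the second position converts any glycine codon (GGN) into an alanine codon (GCN).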

Example 6 The EPSPS Function-Complementation Experiment of E. coli aroA⁻ Strain

pACYC184, pKU2004, pKU2006 and pKU2007 are transformed into E. coli AB2889 (an E. coli aroA⁻ strain, from Yale University), respectively. The transformants are streaked on M63 medium (13.6 g/L KH₂PO₄, 0.5 mg/L FeSO₄·7H₂O, 20 mM (NH₄)₂SO₄, 0.4% glucose, 1 mM magnesium sulfate, 0.5 mg/L Vitamin B1) containing chloramphenicol at a final concentration of 25 mg/L and cultured. The results are shown in Table 1.

The aAAS components are supplemented as follows: 100 mg/L phenylalanine; 100 mg/L tyrosine; 100 mg/L tryptophan; 5 mg/L p-aminobenzoic acid; 5 mg/L 2,3-dihydroxybenzoic acid; 5 mg/L p-hydroxybenzoic acid.

TABLE 1 EPSPS function-complementation and glyphosate tolerance of the aroA-deficient E. coli strain AB2889

| Plasmid carried by AB2889 | M63 medium | M63 medium (supp. aAAS) | 10 mM glyphosate tolerance |
| --- | --- | --- | --- |
| pACYC184 | − | + | − |
| pKU2006 | + | + | − |
| pKU2007 | + | + | + |
| pKU2004 | + | + | + |

At the same time, the growth curves of the strains are measured in liquid culture. The results show that, like the control aroA gene of E. coli (pKU2006), the gene carried by pKU2004 is able to completely complement the EPSPS function of the aroA-deficient E. coli AB2889, suggesting that the 1230-bp nucleic acid sequence carried by said plasmid is an EPSPS-encoding gene, and that the EPSPS encoded by said gene has glyphosate tolerance.

Example 7 The Glyphosate Tolerance of the Novel EPSPS Gene

The plasmids pKU2004, pKU2006 and pKU2007 are transformed into E. coli XL1-Blue MR, respectively. The strains are inoculated and cultured overnight in M63 medium separately, and then transferred to M63 media supplemented with different concentrations of glyphosate. The growth curves are measured. The results show that the E. coli transformed with pKU2006 is inhibited significantly when growing in the 5 mM glyphosate medium, and does not grow in the 40 mM glyphosate medium. In contrast, the E. coli strains transformed with pKU2004 and pKU2007 are not obviously inhibited when growing in the 40 mM glyphosate medium, and the E. coli transformed with pKU2004 grows well in the 120 mM glyphosate medium (FIG. 3: growth curve).

1. An isolated nucleic acid sequence encoding a glyphosate tolerant 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS), wherein the synthase is characterized by that it: (i) comprises the amino acid sequence shown in SEQ ID NO:3, or (ii) is the glyphosate tolerant 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS) which is modified by substitution, deletion and/or addition of one or more amino acids to the amino acid sequence of (i).
2. The isolated nucleic acid sequence according to claim 1, characterized by that it: (i) comprises the nucleotide sequence shown in SEQ ID NO:2, and preferably comprises the sequence shown in nucleotides 574-1803 of SEQ ID NO:2, or (ii) comprises the nucleotide sequence which is modified by substitution of one or more nucleotides, or the deletion and/or addition of three or a multiple of three nucleotides to the nucleotide sequence of (i), and the protein encoded by said nucleotide sequence has the activity of glyphosate tolerant 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS) and the glyphosate tolerance.
3. The protein encoded by the isolated nucleic acid sequence according to claim 1, which has the activity of glyphosate-tolerant 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS) and the glyphosate tolerance.
4. A nucleic acid construct, which comprises the isolated nucleic acid sequence according to claim 1.
5. A vector, which carries the isolated nucleic acid sequence according to claim 1.
6. A host cell, which is transformed by the nucleic acid construct according to claim 4.
7. A host cell or progeny cell thereof, wherein said cell contains the isolated nucleic acid sequence according to claim 1 and has the glyphosate tolerance.
8. A host cell or progeny cell thereof, wherein said cell contains the isolated nucleic acid sequence according to claim 2 and has the glyphosate tolerance.
9. A method for preparing a host cell, comprising operably linking the nucleic acid with appropriate control sequences, introducing them into an appropriate vector, introducing said vector into a selected host cell, and expressing an active glyphosate tolerant 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS) by the nucleic acid sequence.
10. A glyphosate tolerant 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS), comprising: (i) the amino acid sequence shown in SEQ ID NO:3, or (ii) the amino acid sequence which is modified by the substitution, deletion and/or addition of one or more amino acids to the amino acid sequence of (i).
11. The protein encoded by the isolated nucleic acid sequence according to claim 2, which has the activity of glyphosate-tolerant 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS) and the glyphosate tolerance.
12. A nucleic acid construct, which comprises the isolated nucleic acid sequence according to claim 2.
13. A vector, which carries the isolated nucleic acid sequence according to claim 2.
14. A vector, which carries the nucleic acid construct according to claim 4.
15. A vector, which carries the nucleic acid construct according to claim 12.
16. A host cell, which is transformed by the nucleic acid construct according to claim 12.
17. A host cell, which is transformed by the nucleic acid construct according to claim 5.
18. A host cell, which is transformed by the nucleic acid construct according to claim 13.
19. A host cell, which is transformed by the nucleic acid construct according to claim 14.
20. A host cell, which is transformed by the nucleic acid construct according to claim 15.